grasp synthesis
Lightning Grasp: High Performance Procedural Grasp Synthesis with Contact Fields
Yin, Zhao-Heng, Abbeel, Pieter
Despite years of research, real-time diverse grasp synthesis for dexterous hands remains an unsolved core challenge in robotics and computer graphics. We present Lightning Grasp, a novel high-performance procedural grasp synthesis algorithm that achieves orders-of-magnitude speedups over state-of-the-art approaches, while enabling unsupervised grasp generation for irregular, tool-like objects. The method avoids many limitations of prior approaches, such as the need for carefully tuned energy functions and sensitive initialization. This breakthrough is driven by a key insight: decoupling complex geometric computation from the search process via a simple, efficient data structure - the Contact Field. This abstraction collapses the problem complexity, enabling a procedural search at unprecedented speeds. We open-source our system to propel further innovation in robotic manipulation.
T(R,O) Grasp: Efficient Graph Diffusion of Robot-Object Spatial Transformation for Cross-Embodiment Dexterous Grasping
Fei, Xin, Xu, Zhixuan, Fang, Huaicong, Zhang, Tianrui, Shao, Lin
Figure 1: Given an object point cloud and a hand URDF, T(R,O) Grasp efficiently supports both conditioned and unconditioned grasp synthesis using a graph diffusion model. Compared with D(R,O) Grasp [1], our method achieves superior performance with lower memory usage and significantly higher inference speed and throughput.
Dexterous grasping remains a central challenge in robotics due to the complexity of its high-dimensional state and action space. We introduce T(R,O) Grasp, a diffusion-based framework that efficiently generates accurate and diverse grasps across multiple robotic hands. At its core is the T(R,O) Graph, a unified representation that models spatial transformations between robotic hands and objects while encoding their geometric properties. A graph diffusion model, coupled with an efficient inverse kinematics solver, supports both unconditioned and conditioned grasp synthesis. Extensive experiments on a diverse set of dexterous hands show that T(R,O) Grasp achieves an average success rate of 94.83%, an inference speed of 0.21s, and a throughput of 41 grasps per second on an NVIDIA A100 40GB GPU, substantially outperforming existing baselines. In addition, our approach is robust and generalizable across embodiments while significantly reducing memory consumption. Grasping with dexterous hands is a fundamental capability for achieving precise, human-level manipulation. Yet efficiently generating diverse and high-quality grasps remains a longstanding challenge, largely due to the high dimensionality of dexterous hands and the difficulty of ensuring both stability and precision.
Dexonomy: Synthesizing All Dexterous Grasp Types in a Grasp Taxonomy
Chen, Jiayi, Ke, Yubin, Peng, Lin, Wang, He
Generalizable dexterous grasping with suitable grasp types is a fundamental skill for intelligent robots. Developing such skills requires a large-scale and high-quality dataset that covers numerous grasp types (i.e., at least those categorized by the GRASP taxonomy), but collecting such data is extremely challenging. Existing automatic grasp synthesis methods are often limited to specific grasp types or object categories, hindering scalability. This work proposes an efficient pipeline capable of synthesizing contact-rich, penetration-free, and physically plausible grasps for any grasp type, object, and articulated hand. Starting from a single human-annotated template for each hand and grasp type, our pipeline tackles the complicated synthesis problem with two stages: optimize the object to fit the hand template first, and then locally refine the hand to fit the object in simulation. To validate the synthesized grasps, we introduce a contact-aware control strategy that allows the hand to apply the appropriate force at each contact point to the object. Those validated grasps can also be used as new grasp templates to facilitate future synthesis. Experiments show that our method significantly outperforms previous type-unaware grasp synthesis baselines in simulation. Using our algorithm, we construct a dataset containing 10.7k objects and 9.5M grasps, covering 31 grasp types in the GRASP taxonomy. Finally, we train a type-conditional generative model that successfully performs the desired grasp type from single-view object point clouds, achieving an 82.3% success rate in real-world experiments. Project page: https://pku-epic.github.io/Dexonomy.
SPGrasp: Spatiotemporal Prompt-driven Grasp Synthesis in Dynamic Scenes
Mei, Yunpeng, Cao, Hongjie, Xia, Yinqiu, Xiao, Wei, Feng, Zhaohan, Wang, Gang, Chen, Jie
Real-time interactive grasp synthesis for dynamic objects remains challenging because existing methods fail to achieve low-latency inference while maintaining promptability. To bridge this gap, we propose SPGrasp (spatiotemporal prompt-driven dynamic grasp synthesis), a novel framework that extends Segment Anything Model v2 (SAMv2) to grasp estimation on video streams. Our core innovation integrates user prompts with spatiotemporal context, enabling real-time interaction with end-to-end latency as low as 59 ms while ensuring temporal consistency for dynamic objects. In benchmark evaluations, SPGrasp achieves instance-level grasp accuracies of 90.6% on OCID and 93.8% on Jacquard. On the challenging GraspNet-1Billion dataset under continuous tracking, SPGrasp achieves 92.0% accuracy with 73.1 ms per-frame latency, a 58.5% latency reduction compared to the prior state-of-the-art promptable method RoG-SAM, while maintaining competitive accuracy. Real-world experiments involving 13 moving objects demonstrate a 94.8% success rate in interactive grasping scenarios. These results confirm that SPGrasp effectively resolves the latency-interactivity trade-off in dynamic grasp synthesis.
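The latency figures quoted in the SPGrasp abstract can be cross-checked with a little arithmetic. The sketch below assumes the 58.5% figure is a relative reduction in per-frame latency, so the baseline latency it prints is only implied by the quoted numbers and is not a value stated in the abstract; the function names are hypothetical.

```python
def implied_baseline_ms(latency_ms: float, reduction: float) -> float:
    """Baseline latency implied by a relative reduction.

    If new = old * (1 - reduction), then old = new / (1 - reduction).
    """
    return latency_ms / (1.0 - reduction)


def frames_per_second(latency_ms: float) -> float:
    """Frame rate sustainable if each frame takes latency_ms milliseconds."""
    return 1000.0 / latency_ms


# Figures quoted in the abstract.
tracking_latency_ms = 73.1   # per-frame latency on GraspNet-1Billion
best_latency_ms = 59.0       # lowest reported end-to-end latency
reduction = 0.585            # reported reduction vs. RoG-SAM

# Derived values (implied, not reported).
baseline_ms = implied_baseline_ms(tracking_latency_ms, reduction)
print(f"implied baseline latency: {baseline_ms:.1f} ms")
print(f"frame rate at 73.1 ms: {frames_per_second(tracking_latency_ms):.1f} fps")
print(f"frame rate at 59 ms: {frames_per_second(best_latency_ms):.1f} fps")
```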
Task-Oriented Human Grasp Synthesis via Context- and Task-Aware Diffusers
Liu, An-Lun, Chao, Yu-Wei, Chen, Yi-Ting
In this paper, we study task-oriented human grasp synthesis, a new grasp synthesis task that demands both task and context awareness. At the core of our method are task-aware contact maps. Unlike traditional contact maps that only reason about the manipulated object and its relation with the hand, our enhanced maps take into account scene and task information. This comprehensive map is critical for hand-object interaction, enabling accurate grasping poses that align with the task. We propose a two-stage pipeline that first constructs a task-aware contact map informed by the scene and task. In the subsequent stage, we use this contact map to synthesize task-oriented human grasps. We introduce a new dataset and a metric for the proposed task to evaluate our approach. Our experiments validate the importance of modeling both scene and task, demonstrating significant improvements over existing methods in both grasp quality and task performance. See our project page for more details: https://hcis-lab.github.io/TOHGS/
Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation
Ye, Jianglong, Wang, Keyi, Yuan, Chengjing, Yang, Ruihan, Li, Yiquan, Zhu, Jiyue, Qin, Yuzhe, Zou, Xueyan, Wang, Xiaolong
Generating large-scale demonstrations for dexterous hand manipulation remains challenging, and several approaches have been proposed in recent years to address this. Among them, generative models have emerged as a promising paradigm, enabling the efficient creation of diverse and physically plausible demonstrations. In this paper, we introduce Dex1B, a large-scale, diverse, and high-quality demonstration dataset produced with generative models. The dataset contains one billion demonstrations for two fundamental tasks: grasping and articulation. To construct it, we propose a generative model that integrates geometric constraints to improve feasibility and applies additional conditions to enhance diversity. We validate the model on both established and newly introduced simulation benchmarks, where it significantly outperforms prior state-of-the-art methods. Furthermore, we demonstrate its effectiveness and robustness through real-world robot experiments. Our project page is at https://jianglongye.com/dex1b
BG-HOP: A Bimanual Generative Hand-Object Prior
Krishna, Sriram, Chittupalli, Sravan, Park, Sungjae
In this work, we present BG-HOP, a generative prior that seeks to model bimanual hand-object interactions in 3D. We address the challenge of limited bimanual interaction data by extending existing single-hand generative priors, demonstrating preliminary results in capturing the joint distribution of hands and objects. Our experiments showcase the model's capability to generate bimanual interactions and synthesize grasps for given objects. We make code and models publicly available.
Modeling Dynamic Hand-Object Interactions with Applications to Human-Robot Handovers
Humans frequently grasp, manipulate, and move objects. Interactive systems assist humans in these tasks, enabling applications in Embodied AI, human-robot interaction, and virtual reality. However, current methods in hand-object synthesis often neglect dynamics and focus on generating static grasps. The first part of this dissertation introduces dynamic grasp synthesis, where a hand grasps and moves an object to a target pose. We approach this task using physical simulation and reinforcement learning. We then extend this to bimanual manipulation and articulated objects, requiring fine-grained coordination between hands. In the second part of this dissertation, we study human-to-robot handovers. We integrate captured human motion into simulation and introduce a student-teacher framework that adapts to human behavior and transfers from sim to real. To overcome data scarcity, we generate synthetic interactions, increasing training diversity by 100x. Our user study finds no difference between policies trained on synthetic vs. real motions.
BODex: Scalable and Efficient Robotic Dexterous Grasp Synthesis Using Bilevel Optimization
Chen, Jiayi, Ke, Yubin, Wang, He
Robotic dexterous grasping is a key step toward human-like manipulation. To fully unleash the potential of data-driven models for dexterous grasping, a large-scale, high-quality dataset is essential. While gradient-based optimization offers a promising way to construct such datasets, existing works suffer from limitations, such as restrictive assumptions in energy design or limited experiments on small object sets. Moreover, the lack of a standard benchmark for comparing synthesis methods and datasets hinders progress in this field. To address these challenges, we develop a highly efficient synthesis system and a comprehensive benchmark with MuJoCo for dexterous grasping. Our system formulates grasp synthesis as a bilevel optimization problem, combining a novel lower-level quadratic programming (QP) with an upper-level gradient descent process. By leveraging recent advances in CUDA-accelerated robotic libraries and GPU-based QP solvers, our system can parallelize thousands of grasps and synthesize over 49 grasps per second on a single NVIDIA 3090 GPU. Our synthesized grasps for Shadow Hand and Allegro Hand achieve a success rate above 75% in MuJoCo, with a penetration depth and contact distance of under 1 mm, outperforming existing baselines on nearly all metrics. Compared to the previous large-scale dataset, DexGraspNet, our dataset significantly improves the performance of learning models, raising the simulation success rate from around 40% to 80%. Real-world testing of the trained model on the Shadow Hand achieves an 81% success rate across 20 diverse objects.
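The bilevel structure described in the BODex abstract (a lower-level QP nested inside an upper-level gradient descent) can be illustrated with a toy problem. The sketch below is not the authors' formulation: it assumes a frictionless 2D unit disc, contacts parameterized by angle, a nonnegative least-squares problem as the lower-level QP (solved here by projected gradient descent rather than a GPU QP solver), and finite-difference gradients at the upper level. All function names are hypothetical.

```python
import numpy as np


def grasp_matrix(theta):
    """Columns are the inward unit normals at contact angles theta on a unit disc."""
    return -np.stack([np.cos(theta), np.sin(theta)])  # shape (2, n_contacts)


def solve_forces(G, w, iters=300):
    """Lower level: min_f 0.5 * ||G f - w||^2 subject to f >= 0.

    Solved with projected gradient descent; the step size is 1 / L,
    where L is the squared spectral norm of G.
    """
    L = np.linalg.norm(G, 2) ** 2 + 1e-9
    f = np.zeros(G.shape[1])
    for _ in range(iters):
        f = np.clip(f - G.T @ (G @ f - w) / L, 0.0, None)
    return f


def residual(theta, w):
    """Upper-level objective: wrench error left after the best feasible forces."""
    G = grasp_matrix(theta)
    f = solve_forces(G, w)
    return 0.5 * np.sum((G @ f - w) ** 2)


def synthesize(theta0, w, steps=60, lr=0.5, eps=1e-4):
    """Upper level: gradient descent on contact angles (central finite differences)."""
    theta = theta0.copy()
    for _ in range(steps):
        grad = np.array([
            (residual(theta + eps * e, w) - residual(theta - eps * e, w)) / (2 * eps)
            for e in np.eye(len(theta))
        ])
        theta -= lr * grad
    return theta


# Target: the contacts must jointly push the disc upward with unit force.
w = np.array([0.0, 1.0])
theta0 = np.array([-0.5, 0.5])  # initial contact angles in radians
theta_opt = synthesize(theta0, w)

print("initial residual:", residual(theta0, w))
print("final residual:  ", residual(theta_opt, w))
print("contact forces:  ", solve_forces(grasp_matrix(theta_opt), w))
```

In this toy setup the first contact slides toward the bottom of the disc, where its normal aligns with the target force, while the second contact carries no load and stays put. A real system would add friction cones, torques, collision terms, and analytic gradients through the QP.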
Diffusion for Multi-Embodiment Grasping
Freiberg, Roman, Qualmann, Alexander, Vien, Ngo Anh, Neumann, Gerhard
Grasping is a fundamental skill in robotics with diverse applications across medical, industrial, and domestic domains. However, current approaches for predicting valid grasps are often tailored to specific grippers, limiting their applicability when gripper designs change. To address this limitation, we explore the transfer of grasping strategies between various gripper designs, enabling the use of data from diverse sources. In this work, we present an approach based on equivariant diffusion that facilitates gripper-agnostic encoding of scenes containing graspable objects and gripper-aware decoding of grasp poses by integrating gripper geometry into the model. We also develop a dataset generation framework that produces cluttered scenes with variable-sized object heaps, improving the training of grasp synthesis methods. Experimental evaluation on diverse object datasets demonstrates the generalizability of our approach across gripper architectures, ranging from simple parallel-jaw grippers to humanoid hands, outperforming both single-gripper and multi-gripper state-of-the-art methods.